Method, system and apparatus for an output generator for use in the processing of structured documents

ABSTRACT

Embodiments of systems, methods and apparatuses for an output generator for generating output from processed content of a structured document are disclosed. More specifically, embodiments of an output generator may comprise hardware circuitry operable to order data resulting from the transformation of a structured document as it is generated and format this data according to a format of a corresponding output document to generate output corresponding to the output document.

RELATED APPLICATIONS

This application claims a benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Nos. 60/675,349, by inventors Howard Tsoi, Daniel Cermak, Richard Trujillo, Trenton Grale, Robert Corley, Bryan Dobbs and Russell Davoli, entitled “Output Generator for Use with System for Creation of Multiple, Hierarchical Documents”, filed on Apr. 27, 2005; 60/675,347, by inventors Daniel Cermak, Howard Tsoi, John Derrick, Richard Trujillo, Udi Kalekin, Bryan Dobbs, Ying Tong, Brendon Cahoon and Jack Matheson, entitled “Transformation Engine for Use with System for Creation of Multiple, Hierarchical Documents”, filed on Apr. 27, 2005; 60/675,167, by inventors Richard Trujillo, Bryan Dobbs, Rakesh Bhakta, Howard Tsoi, Jack Randall, Howard Liu, Yongjian Zhou and Daniel Cermak, entitled “Parser for Use with System for Creation of Multiple, Hierarchical Documents”, filed on Apr. 27, 2005; and 60/675,115, by inventors John Derrick, Richard Trujillo, Daniel Cermak, Bryan Dobbs, Howard Liu, Rakesh Bhakta, Udi Kalekin, Russell Davoli, Clifford Hall and Avinash Palaniswamy, entitled “General Architecture for a System for Creation of Multiple, Hierarchical Documents”, filed on Apr. 27, 2005, the entire contents of which are hereby expressly incorporated by reference for all purposes.

TECHNICAL FIELD OF THE INVENTION

The invention relates in general to methods, systems and apparatuses for processing structured documents, and more particularly, to efficiently generating output resulting from the processing, transformation or rendering of structured documents.

BACKGROUND OF THE INVENTION

Electronic data, entertainment and communications technologies are growing increasingly prevalent with each passing day. In the past, the vast majority of electronic documents were in a proprietary format. In other words, a particular electronic document could only be processed or understood by the application that created that document. Up until relatively recently this has not been especially troublesome.

This situation became progressively more problematic with the advent of networking technologies, however. These networking technologies allowed electronic documents to be communicated between different and varying devices, and as these network technologies blossomed, so did users' desires to use these networked devices to share electronic data.

Much to the annoyance of many users, however, the proprietary formats of the majority of these electronic documents prevented them from being shared between different platforms: if a document was created by one type of platform it usually could not be processed, or rendered, by another type of platform.

To that end, data began to be placed in structured documents. Structured documents may be loosely defined as any type of document that adheres to a set of rules. Because the structured document conforms to a set of rules it enables the cross-platform distribution of data, as an application or platform may process or render a structured document based on the set of rules, no matter the application that originally created the structured document.

The use of structured documents to facilitate the cross-platform distribution of data is not without its own set of problems, however. In particular, in many cases the structured document does not itself define how the data it contains is to be rendered, for example for presentation to a user. Exacerbating the problem is the size of many of these structured documents. To facilitate the organization of data intended for generic consumption these structured documents may contain a great deal of meta-data, and thus may be larger than similar proprietary documents, in some cases up to twenty times larger or more.

In many cases, instructions may be provided for how to transform or render a particular structured document. For example, one mechanism implemented as a means to facilitate processing XML is the extensible stylesheet language (XSL) and stylesheets written using XSL. Stylesheets may be written to transform XML documents from one markup definition (or “vocabulary”) defined within XML to another vocabulary, from XML markup to another structured or unstructured document form (such as plain text, word processor, spreadsheet, database, pdf, HTML, etc.), or from another structured or unstructured document form to XML markup. Thus, stylesheets may be used to transform a document's structure from its original form to a form expected by a given user (output form).

Typically, structured documents are transformed or rendered with one or more software applications. However, as many definitions for these structured languages were designed and implemented without taking into account conciseness or efficiency of parsing and transformation, the use of software applications to transform or render these structured documents may be prohibitively inefficient.

Thus, as can be seen, there is a need for methods and systems for an architecture for the efficient processing of structured documents and the generation of output from the processing of these structured documents.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings accompanying and forming part of this specification are included to depict certain aspects of embodiments of the invention. A clearer impression of embodiments of the invention, and of the components and operation of systems provided with embodiments of the invention, will become more readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawings, wherein identical reference numerals designate the same components. Note that the features illustrated in the drawings are not necessarily drawn to scale.

FIG. 1 depicts an embodiment of an architecture for the implementation of web services.

FIG. 2 depicts an embodiment of an architecture for an output generator.

FIG. 3 depicts one embodiment for the processing of structured documents utilizing a document processor.

FIG. 4 depicts one embodiment of an architecture for a device for the processing of structured documents.

FIG. 5 depicts one embodiment of an architecture for the processing of structured documents utilizing an embodiment of the device depicted in FIG. 4.

FIG. 6 depicts an embodiment of the interface between a transformation engine and an output generator.

FIG. 7 depicts one embodiment of communications between a transformation engine and an output generator.

FIG. 8 depicts one embodiment of an order control unit and an output walker.

FIG. 9 depicts one embodiment of an output data structure.

FIG. 10 depicts one embodiment of a value extractor.

FIG. 11 depicts one embodiment of a value extractor and an output formatter.

DETAILED DESCRIPTION

Embodiments of the invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well known starting materials, processing techniques, components and equipment are omitted so as not to unnecessarily obscure the invention in detail. Skilled artisans should understand, however, that the detailed description and the specific examples, while disclosing preferred embodiments of the invention, are given by way of illustration only and not by way of limitation. Various substitutions, modifications, additions or rearrangements within the scope of the underlying inventive concept(s) will become apparent to those skilled in the art after reading this disclosure.

Reference is now made in detail to the exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts (elements).

Before describing embodiments of the present invention it may be useful to describe an exemplary architecture for a web service. Although web services are known in the art, a description of such an architecture may be helpful in better explaining the embodiments of the invention depicted herein.

FIG. 1 depicts an embodiment of one such architecture for implementing a web service. Typically, web services provide a standard means of interoperating between different software applications running on a variety of platforms and/or frameworks. A web service provider 110 may provide a set of web services 112. Each web service 112 may have a described interface, such that a requestor may interact with the web service 112 according to that interface.

For example, a user at a remote machine 120 may wish to use a web service 112 provided by web service provider 110. To that end the user may use a requestor agent to communicate message 130 to a service agent associated with the desired web service 112, where the message is in a format prescribed by the definition of the interface of the desired web service 112. In many cases, the definition of the interface describes the message formats, data types, transport protocols, etc. that are to be used between a requestor agent and a provider agent.

The message 130 may comprise data to be operated on by the requested web service 112. More particularly, message 130 may comprise a structured document and instructions for transforming the structured document. For example, message 130 may be a SOAP (e.g. Simple Object Access Protocol) message comprising an eXtensible Markup Language (XML) document and an eXtensible Stylesheet Language Transformations (XSLT) stylesheet associated with the XML document. It should be noted that, in some cases, transformation instructions (e.g. a Document Type Definition (DTD), schema, or stylesheet) may be embedded in a structured document, for example, either directly or as a pointer. In such cases the transformation instructions may be extracted from the document before being utilized in any subsequent method or process.

Thus, in some cases the provider agent associated with a particular web service 112 may receive message 130; web service 112 may process the structured document of message 130 according to the instructions for transforming the structured document included in message 130; and the result 140 of the transformation may be returned to the requestor agent.

In some cases, many structured documents may be sent to a particular web service 112 with one set of transformation instructions, so that each of these documents may be transformed according to the identical set of instructions. Conversely, one structured document may be sent to a particular web service 112 with multiple sets of transformation instructions to be applied to the structured document.

Hence, as can be seen from this brief overview of the architecture for implementing web services 112, it may be highly desired to process these structured documents as efficiently as possible, such that web services 112 may be used on many data sets and large data sets without creating a bottleneck during the processing of the structured documents, and such that processing resources of web service provider 110 may be effectively utilized.

More particularly, after processing or transforming content of structured documents it may be necessary to generate an output document comprising this processed content, where the output document is in one or more formats similar or dissimilar to the format of the original structured document, so that an output document may be provided to a requester. This output document may have a certain order; in other words, the makeup of the output document (e.g. markup, content, etc.) may need to have a certain order if the output document is to be valid or usable. Furthermore, it may be desired to generate the output document quickly, as requestors or applications may be awaiting the arrival of such an output document.

In certain cases, however, the transformation or processing of content of a structured document may occur, and portions of the transformation completed, in an order differing from an order corresponding with the output document, and the transformation of content corresponding to multiple structured documents may occur substantially simultaneously. Thus, it is desirable to both associate transformed content with an output document (which may, in turn, correspond to a particular original structured document) and to assemble transformed or processed content for an output document in an order corresponding with an output document as that content is processed or transformed. By associating processed content with an output document, and assembling content for an output document according to an order of the output document substantially as it is generated, output documents may be assembled quickly, and many output documents may be assembled substantially simultaneously.

Attention is now directed to embodiments of systems and methods for an architecture for the efficient generation of output associated with a transformation process applied to a structured document. Embodiments of the present invention may provide an output generator which comprises hardware circuitry, for example a hardware processing device such as an Application Specific Integrated Circuit (ASIC), for generating output from processed content of a structured document. In other words, embodiments of the present invention may provide hardware with the capability to process content resulting from the transformation of the content of a structured document to provide an output stream (e.g. substantially without the use of software). This hardware may utilize data structures storing the resulting content, some of which may reflect the structure of a desired output document. The content in these data structures may then be formatted and output according to a desired structure, format or encoding for an output document.

More specifically, embodiments of the present invention may provide an output generator circuit operable to allocate data structures corresponding with an output document as a transformation process is applied to a structured document. Content resulting from the transformation of the content of a structured document (which will be collectively referred to herein as transformed content) may be placed in these data structures in an order which does not conform to the desired order of the output document. In other words, data from the transformation process of an original structured document may be placed in these data structures as it is transformed or otherwise processed, resulting in the filling of these data structures in an arbitrary order (e.g. an order that conforms with the processing of the content of the structured document and not an order of the output document). Consequently, the output generator may ascertain when data corresponding with the order of the output document has been placed in these data structures, and as data conforming with the order of the output document is placed in these data structures it may be formatted according to a format of a desired output document. Output corresponding with an output document may thus be generated substantially as the data corresponding with that output document is generated and before the generation of all the transformed content for the output document has been completed. Furthermore, the process of filling data structures associated with an output document, formatting this data and outputting portions of the output document may occur substantially simultaneously across multiple output documents; thus multiple output documents, in multiple formats, may be created from one or more original structured documents substantially in parallel.
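The out-of-order fill and in-order release described above can be modeled in software. The following Python sketch is illustrative only and is not the disclosed hardware circuit; the class name `OrderedOutputBuffer`, the flat list of slots and the `place` method are assumptions introduced for this example. It shows content being released as soon as everything that precedes it in output order is present, before the remaining content has been generated.

```python
from typing import List, Optional


class OrderedOutputBuffer:
    """Hypothetical software model: slots are filled in any order,
    but content is released strictly in output-document order."""

    def __init__(self, slot_count: int) -> None:
        # One slot per piece of transformed content, indexed by its
        # position in the output document (an assumption of this sketch).
        self.slots: List[Optional[str]] = [None] * slot_count
        self.next_to_emit = 0

    def place(self, index: int, content: str) -> List[str]:
        """Store content at its output position and return whatever
        has just become releasable, in output order."""
        self.slots[index] = content
        released: List[str] = []
        # Release the longest run of filled slots starting at the
        # first slot that has not been emitted yet.
        while (self.next_to_emit < len(self.slots)
               and self.slots[self.next_to_emit] is not None):
            released.append(self.slots[self.next_to_emit])
            self.next_to_emit += 1
        return released
```

In this model, placing the third piece first releases nothing, placing the first piece releases only that piece, and placing the second piece then releases the second and third together.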

Moreover, by allowing dynamic creation of output structures, the filling of these output structures as the content of a structured document is transformed, and the processing of this transformed content to form an output stream corresponding to an output document, the processing of a structured document to create an output document may be made substantially more efficient, as the operations of transforming the structured document can be accomplished without concern to the order of these operations or the order desired for an output document. As the transformed content is generated for the output document, however, it may be formatted into an output stream according to the order of a desired output document. Thus, portions of an output document resulting from a transformation of a structured document may be output even before the transformation of the original structured document has completed processing.

Embodiments of the output generator of the present invention may have a set of logical units. One of the logical units may be responsible for allocating data structures for an output document and placing data in these data structures. Another logical unit may be operable to traverse these data structures to obtain references to data in the data structures corresponding with an output document. Another logical unit may locate the transformed content associated with those references and format the transformed content according to a format for the output document to generate a character stream comprising a portion of an output document, which may then be encoded by another logical unit into a suitable character encoding format.

One particular embodiment of an output generator is depicted in FIG. 2. Output generator 350 may have one or more interfaces 202 through which output generator 350 can receive a request for the creation of a data structure in memory 270 corresponding with an output document or content to place in a data structure corresponding to an output document. Utilizing the data structures in memory 270, output generator 350 outputs a character stream corresponding to one or more output documents, or messages associated with the transformation of a structured document, through output interface 204.

Output generator 350 comprises a set of logical units 212, 222, 232, 242. Order control unit 212 is operable to receive requests to create data structures corresponding to an output document in memory 270, allocate these data structures and return a reference or handle to these data structures to the requester. Additionally, order control unit 212 is operable to receive data to be placed in a data structure in memory 270 and place this data in the appropriate data structure in memory 270. Order control unit 212 may signal (e.g. send a command, known as a walk event) to output walker 222 when data associated with an output document has arrived which may be output (e.g. when data for an output document has been placed in a data structure where substantially all transformed content for that output document which would precede the data in the output document has been, or is in the process of being, output).

Output walker 222 is operable to receive these walk events and traverse data structures in memory 270 to locate data corresponding to the output document which is to be output and pass one or more commands to value extractor 232 indicating the location of this data in data structures in memory 270.

Value extractor 232 may locate and retrieve data referenced by commands from output walker 222 and mark up the data according to a format of the output document (e.g. XML, HTML, text, etc.) to form a character stream. Thus, value extractor 232 may add characters to, or remove characters from, the data referenced by the command(s) to format the data according to a format for the output document.
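As a rough software analogy for the markup step, the sketch below adds the characters needed to present a piece of content as XML, or leaves it untouched for plain text output. It is not the disclosed value extractor; the function name `mark_up` and its parameters are hypothetical, and only the Python standard library helpers `escape` and `quoteattr` from `xml.sax.saxutils` are relied upon.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape, quoteattr


def mark_up(name: str,
            text: str,
            attributes: Optional[Dict[str, str]] = None,
            output_format: str = "xml") -> str:
    """Hypothetical helper: wrap content in markup for the chosen format."""
    if output_format == "text":
        # Plain text output carries the content without any markup.
        return text
    # For XML output, quote attribute values and escape reserved
    # characters (&, <, >) in the element content.
    attrs = "".join(
        f" {key}={quoteattr(value)}"
        for key, value in (attributes or {}).items()
    )
    return f"<{name}{attrs}>{escape(text)}</{name}>"
```

The same content can therefore yield different character streams depending on the format requested for the output document.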

Output formatter 242 may take this character stream and transcode each of these characters from an internal encoding scheme to an encoding scheme according to a desired character encoding of the output document. Thus, data may arrive at output formatter 242 in an internal encoding scheme and be converted to one of a variety of encoding schemes such as Unicode Transformation Format (UTF), Big Endian, Little Endian, International Standards Organization (ISO) 8859, Windows (Win) 1252, etc. to produce an output stream corresponding to at least a portion of an output document which may be delivered through output interface 204.
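The transcoding step can be illustrated with Python's built-in codecs. In this sketch the Python `str` type stands in for the internal encoding scheme, which is an assumption; the table `SUPPORTED_ENCODINGS` and the function `transcode` are names invented for the example and do not describe the disclosed output formatter.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical mapping from output-document encoding labels to
# Python codec names (all of these codecs ship with CPython).
SUPPORTED_ENCODINGS: Dict[str, str] = {
    "UTF-8": "utf-8",
    "UTF-16BE": "utf-16-be",   # big endian, no byte order mark
    "UTF-16LE": "utf-16-le",   # little endian, no byte order mark
    "ISO-8859-1": "iso-8859-1",
    "WIN-1252": "cp1252",
}


def transcode(character_stream: Iterable[str], target: str) -> Iterator[bytes]:
    """Convert chunks of the internal character stream to the byte
    encoding requested for the output document, chunk by chunk."""
    codec = SUPPORTED_ENCODINGS[target]
    for chunk in character_stream:
        yield chunk.encode(codec)
```

Because each chunk is encoded as it arrives, a portion of the output stream can be delivered before the rest of the character stream exists.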

While it should be understood that embodiments of the present invention may be applied with respect to producing an output document associated with the transformation of almost any structured document (e.g. a document having a defined structure that can be used to interpret the content) whether the content of the original document is highly structured (such as an XML document, Hypertext Markup Language (HTML) document, .pdf document, word processing document, database, etc.) or loosely structured (such as a plain text document whose structure may be, e.g., a stream of characters), it may be useful to illustrate one particular embodiment of an output generator in conjunction with an architecture for transforming XML or other structured documents utilizing a set of transformation instructions for the XML document (e.g. a stylesheet). While this illustration of such an exemplary architecture uses one embodiment of an output generator such as that described herein it will be apparent that, as discussed above, embodiments of an output generator may be utilized in a wide variety of other architectures and may be applied to generate output documents with or without the use of transformation instructions, preparsed data, etc.

Attention is now directed to an architecture for the efficient transformation or processing of structured documents in which an embodiment of an output generator may be utilized. Embodiments of the architecture may comprise an embodiment of the aforementioned output generator along with other logical components including a pattern expression processor, a transformation engine and a parser, one or more of which may be implemented in hardware circuitry, for example a hardware processing device such as an Application Specific Integrated Circuit (ASIC) which comprises all the above mentioned logical components, including the output generator.

More particularly, transformation instructions associated with a structured document may be compiled to generate instruction code and a set of data structures. The parser parses the structured document associated with the transformation instructions to generate data structures representative of the structured document. The pattern expression processor (PEP) identifies data in the structured document corresponding to definitions in the transformation instructions. The transformation engine transforms the parsed document according to the transformation instructions and the output generator assembles this transformed data into an output document.

Turning to FIG. 3, a block diagram for the transformation of structured documents using embodiments of the present invention is depicted. A structured document may be received at a web service 112 from a variety of sources such as a file server, database, internet connection, etc. Additionally, a set of transformation instructions, for example an XSLT stylesheet, may also be received. Document processor 210 may apply the transformation instructions to the structured document to generate an output document which may be returned to the requesting web service 112, which may, in turn, pass the output document to the requester.

In one embodiment, compiler 220, which may comprise software (i.e. a plurality of instructions) executed on one or more processors (e.g. distinct from document processor 210), may be used to compile the transformation instructions to generate data structures and instruction code in memory 270 for use by document processor 210. Document processor 210 may be one or more ASICs operable to utilize the data structures and instruction code generated by compiler 220 to generate an output document.

FIG. 4 depicts a block diagram of one embodiment of an architecture for a document processor operable to produce an output document from a structured document. Document processor 210 comprises Host Interface Unit (HIU) 310, Parser 320, PEP 330, Transformation Engine (TE) 340, Output Generator (OG) 350, each of which is coupled to memory interface 360, to Local Command Bus (LCB) 380 and, in some embodiments, to one another through signal lines or shared memory 270 (e.g. a source unit may write information to be communicated to a destination unit to the shared memory and the destination unit may read the information from the shared memory), or both. Shared memory 270 may be any type of storage known in the art, such as RAM, cache memory, hard-disk drives, tape devices, etc.

HIU 310 may serve to couple document processor 210 to one or more host processors (not shown). This coupling may be accomplished, for example, using a Peripheral Component Interconnect eXtended (PCI-X) bus. HIU 310 also may provide an Applications Programming Interface (API) through which document processor 210 can receive jobs. Additionally, HIU 310 may interface with LCB 380 such that various tasks associated with these jobs may be communicated to components of document processor 210.

In one embodiment, these jobs may comprise context data, including a structured document, data structures, and instruction code generated from transformation instructions by the compiler. Thus, the API may allow the context data to be passed directly to HIU 310, or, in other embodiments, may allow references to one or more locations in shared memory 270 where context data may be located to be provided to HIU 310. HIU 310 may maintain a table of the various jobs received through this API and direct the processing of these jobs by document processor 210. By allowing multiple jobs to be maintained by HIU 310, these jobs may be substantially simultaneously processed (e.g. processed in parallel) by document processor 210, allowing document processor 210 to be more efficiently utilized (e.g. higher throughput of jobs and lower latency).

Parser 320 may receive and parse a structured document, identifying data in the structured document for PEP 330 and generating data structures comprising data from the structured document by, for example, creating data structures in shared memory 270 for use by transformation engine 340 or output generator 350.

PEP 330 receives data from parser 320 identifying data of the structured document being processed and compares data identified by the parser 320 against expressions identified in the transformation instructions. PEP 330 may also create one or more data structures in shared memory 270, where the data structures comprise a list of data in the structured document which match expressions.

Transformation engine 340 may access the data structures built by parser 320 and PEP 330 and execute instruction code generated by compiler 220 and stored in memory 270 to generate data for an output document. In some embodiments, one or more instructions of the instruction code generated by compiler 220 may be operable to be independently executed (e.g. execution of one instruction does not depend directly on the result of the output of the execution of another instruction), and thus execution of the instruction code by transformation engine 340 may occur in substantially any order.

Output generator 350 may assemble the results generated by transformation engine 340 in an order corresponding to an output document to form one or more character streams corresponding to an output document. The output document may then be provided to the initiating web service 112 through HIU 310, for example, by signaling the web service 112 or a host processor that the job is complete and providing a reference to a location in memory 270 where an output document exists, or by streaming the output document as it is produced.

Moving now to FIG. 5, an example application of one embodiment of a document processor to an XML document and an XSLT stylesheet is illustrated. It is noted that, while the description herein may include examples in which transformation instructions are applied to a single source document, other examples may include applying multiple sets of transformation instructions to a source document (either concurrently or serially, as desired) or applying a set of transformation instructions to multiple source documents (either concurrently with context switching or serially, as desired). Generally, an XML document is a structured document which has a hierarchical tree structure, where the root of the tree identifies the document as a whole and each other node in the document is a descendant of the root. Various elements, attributes, and document content form the nodes of the tree. The elements define the structure of the content that the elements contain. Each element has an element name, and the element delimits content using a start tag and an end tag that each include the element name. An element may have other elements as sub-elements, which may further define the structure of the content. Additionally, elements may include attributes (included in the start tag, following the element name), which are name/value pairs that provide further information about the element or the structure of the element content. XML documents may also include processing instructions that are to be passed to the application reading the XML document, comments, etc.
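The tree structure of an XML document described above can be seen by walking a small document with Python's standard `xml.etree.ElementTree` module. The sample document `SAMPLE` and the helper `describe` are invented for this illustration; they are not taken from the embodiments.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical sample: one root element with two sub-elements,
# each carrying an attribute and text content.
SAMPLE = (
    '<order id="42">'
    '<item sku="A1">Widget</item>'
    '<item sku="B2">Gadget</item>'
    '</order>'
)

Row = Tuple[int, str, Dict[str, str], str]


def describe(xml_text: str) -> List[Row]:
    """Return (depth, element name, attributes, text) for every
    element, visiting the tree in document order."""
    rows: List[Row] = []

    def visit(element: ET.Element, depth: int) -> None:
        text = (element.text or "").strip()
        rows.append((depth, element.tag, dict(element.attrib), text))
        for child in element:
            visit(child, depth + 1)

    visit(ET.fromstring(xml_text), 0)
    return rows
```

Here `order` is the root, and both `item` elements are its descendants at depth 1, each with a name/value attribute pair taken from its start tag.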

An XSLT stylesheet is a set of transformation instructions which may be viewed as a set of templates. Each template may include: (i) an expression that identifies nodes in a document's tree structure; and (ii) a body that specifies a corresponding portion of an output document's structure for nodes of the source document identified by the expression. Applying a stylesheet to a source document may comprise attempting to find a matching template for one or more nodes in the source document, and instantiating the structures corresponding to the body of the matching template in an output document.

Again, while XSLT stylesheets may be used in one example herein of transformation instructions, generally “transformation instructions” may comprise any specification for transforming a source document to an output document, which may encompass, for example, statements intended to identify data of the source document or statements for how to transform data of the source document. The source and output documents may be in the same language (e.g. the source and output documents may be different XML vocabularies), or may differ (e.g. XML to pdf, etc.).

Referring still to FIG. 5, an XML document and an associated XSL stylesheet may be received by web service 112. Web service 112 may invoke embodiments of the present invention to transform the received document according to the received stylesheet. More specifically, in one embodiment, compiler 220 may be used to compile the XSL stylesheet to generate data structures and instruction code for use by document processor 210. Compiler 220 may assign serial numbers to node identifiers in the stylesheet so that expression evaluation may be performed by document processor 210 by comparing numbers, rather than node identifiers (which would involve character string comparisons).
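The idea of replacing node identifiers with serial numbers resembles string interning. The sketch below is a minimal software model under that assumption; the class `SymbolTable` and its methods `intern` and `lookup` are hypothetical names and do not describe the actual layout of the data the compiler produces.

```python
from typing import Dict, List, Optional


class SymbolTable:
    """Hypothetical model: each distinct node identifier gets one
    serial number, so later comparisons are integer comparisons."""

    def __init__(self) -> None:
        self._serial_by_name: Dict[str, int] = {}
        self._name_by_serial: List[str] = []

    def intern(self, name: str) -> int:
        """Return the serial number for name, assigning the next
        unused number the first time the name is seen."""
        serial = self._serial_by_name.get(name)
        if serial is None:
            serial = len(self._name_by_serial)
            self._serial_by_name[name] = serial
            self._name_by_serial.append(name)
        return serial

    def lookup(self, name: str) -> Optional[int]:
        """Return the serial number for name, or None if the name
        was never assigned one."""
        return self._serial_by_name.get(name)
```

Once both the stylesheet and the source document are expressed in these numbers, deciding whether two node identifiers are the same no longer requires a character-by-character string comparison.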

Compiler 220 may also store a mapping of these node identifiers to serial numbers in one or more symbol tables 410 in memory 270. Additionally, compiler 220 may extract the expressions from the stylesheet and generate expression tree data structures in memory 270 to be used by the document processor 210 for expression matching (e.g. one or more parse-time expression trees 420 comprising expression nodes). Still further, compiler 220 may generate an instruction table 430 in memory 270 with instructions to be executed for one or more matching expressions. The instructions in the instruction table 430, when executed by document processor 210, may result in performing the actions defined when an expression associated with the instruction is matched. In some embodiments, the instructions may comprise the actions to be performed (i.e. there may be a one-to-one correspondence between instructions and actions). The compiler may also generate whitespace tables 440 defining how various types of whitespace in the source document are to be treated (e.g. preserved, stripped, etc.), an expression list table 450, a template list table 460 and one or more DTD tables 462 to map entity references to values or specify default values for attributes.

At this point, processing of the source document by document processor 210 may begin. Parser 320 receives the structured document and accesses the symbol tables 410, whitespace tables 440, or DTD tables 462 in memory 270 to parse the structured document, identify document nodes, and generate events (e.g. to identify document nodes parsed from the document) to PEP 330. More particularly, parser 320 converts node identifiers in the source document to corresponding serial numbers in the symbol tables 410, and transmits these serial numbers as part of the events to the PEP 330. Additionally, parser 320 may generate a parsed document tree 470 representing the structure of the source document in memory. Nodes of the parsed document tree may reference corresponding values stored in one or more parsed content tables 472 created in memory by parser 320.

PEP 330 receives events from the parser 320 and compares identified document nodes (e.g. based on their serial numbers) against parse-time expression tree(s) 420 in memory 270. Matching document nodes are identified and recorded in template or expression match lists 480 in memory 270.

Transformation engine 340 accesses the template or expression match lists 480, the parsed document tree 470, the parsed content tables 472 or the instruction table 430. The transformation engine 340 executes instructions from the instruction table 430 in memory 270. These instructions may be associated with one or more expressions. Transformation engine 340 may execute the instructions on each of the document nodes that matches the expression associated with the instruction. Transformation engine 340 may request the construction of one or more output data structures from output generator 350 and send commands to output generator 350 requesting that data resulting from the execution of these instructions be stored in one or more of these data structures.

This may be illustrated more clearly with reference to FIGS. 6 and 7. FIG. 6 illustrates one embodiment of logical interfaces between transformation engine 340 and output generator 350. In one embodiment, transformation engine 340 may comprise a set of application engines, an event generator, a hardware accelerator or other logical units which are operable to process events associated with the transformation instructions generated by compiler 220. The execution of these events may result in a request to generate a data structure (e.g. output table) from a logical unit of the transformation engine to the output generator 350, a request for a reference to a created data structure or a command to place data in a location in a previously created data structure. More particularly, in one embodiment, two buses 610, 620 may facilitate communications between transformation engine 340 and output generator 350. Bus 610 is a bi-directional bus allowing logical components of transformation engine 340 to send a request (e.g. command) that a data structure be built by output generator 350, and that a reference, or handle, to that data structure be returned to transformation engine 340. The request sent by a logical component of transformation engine 340 may, in one embodiment, be received and processed by order control unit 212.

Bus 620 may be a unidirectional bus which allows logical components of transformation engine 340 to send a request to output generator 350 to place data in a previously created data structure. Requests received on bus 620 may be placed at the tail of FIFO queue 622 associated with order control unit 212. Order control unit 212 obtains requests from the head of queue 622 and processes these requests.

The communications between transformation engine 340 and output generator 350 may be better explained with reference to FIG. 7, which illustrates one embodiment of these communications in more detail. Transformation engine 340, or logical units thereof, may send a request 712 over bus 610, where request 712 is a request to build an output table with a number of entries and return a handle to that output table. Order control unit 212 of output generator 350 may receive request 712, create the requested output table in memory 270 and return a reference (e.g. memory handle) to that output table to transformation engine 340 through communication 714 over bus 610.

During the execution of transformation instructions, then, if a logical component of transformation engine 340 desires to put data corresponding to an output document (which may be referred to as output document data) in an entry of an output table in memory 270, the logical component may issue one or more requests 722 over bus 620 to output generator 350, where the request includes a reference to a particular output table, and the data to place in the output table. As mentioned with respect to FIG. 6, these requests 722 may be placed in FIFO queue 622 associated with order control unit 212, which may process these requests in the order they are placed in queue 622.
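The two kinds of requests described above (a table-creation request answered with a handle, and store requests processed in FIFO order) can be modeled in software as follows. This is a minimal sketch, not the hardware design: the class `OrderControl`, its method names, and the use of integers as handles are all assumptions made for illustration.

```python
from collections import deque


class OrderControl:
    """Software stand-in for order control unit 212 (hypothetical model)."""

    def __init__(self):
        self.tables = {}        # handle -> list of entries (stands in for memory 270)
        self.queue = deque()    # stands in for FIFO queue 622
        self._next_handle = 0

    def create_table(self, num_entries):
        """Like request 712: build an output table and return a handle to it."""
        handle = self._next_handle
        self._next_handle += 1
        self.tables[handle] = [None] * num_entries
        return handle

    def enqueue_store(self, handle, index, value):
        """Like request 722: queue a request to place data in an existing table."""
        self.queue.append((handle, index, value))

    def process_all(self):
        """Process queued store requests strictly from the head of the queue."""
        while self.queue:
            handle, index, value = self.queue.popleft()
            self.tables[handle][index] = value


ocu = OrderControl()
root = ocu.create_table(3)
ocu.enqueue_store(root, 2, "c")   # data may arrive out of document order
ocu.enqueue_store(root, 0, "a")
ocu.process_all()
```

After `process_all()` the table holds `"a"` and `"c"` with the middle entry still empty, which is the out-of-order situation the later walk mechanism has to handle.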

In one embodiment, requests 712 and 722 may serve to create and fill output tables associated with one or more output documents. For each output document at least one root table may be requested by transformation engine 340 and allocated by output generator 350, where the root table comprises a number of entries.

Linked tables for each output document may also be constructed in substantially the same manner, and these linked tables linked to entries in the root table such that the entry in the root table references the linked table and the linked table references the next entry (i.e. the entry in the root table following the entry that references the linked table) in the root table (tables linked to the root table may be referred to as first level linked tables). It will be apparent that linked tables may, in turn, be linked to other linked tables to create a set of linked data structures of arbitrary depth. For example, a linked table (e.g. second level linked table) may be created and linked to an entry in a first level linked table, such that the entry in the first level table references the linked table and the second level linked table references the next entry in the first level table.

Thus, by initiating the construction of various output tables, and setting the entries of the various output tables, including the linking of the various output tables, transformation engine 340 may serve to create a set of linked output tables in memory 270 representative of the structure of an output document and comprising, or referencing, at least a portion of the content comprising the output document through communication with order control unit 212. However, from the above description it may be gleaned that these output tables may be neither created nor filled in the same order as the output document (i.e. data belonging at one place in the output document may be placed in an output table before data which will come before it in the document, etc.).

To allow an output document to be output in a streaming manner, as data associated with the output document is generated by transformation engine 340, order control unit 212 may generate a walk event to output walker 222 to indicate an output table or entry in an output table has been completed. This walk event may be placed in a walk event queue for output walker 222, which obtains walk events from this queue. Output walker 222 may traverse data structures in memory 270 based on these walk events to assemble commands to value extractor 232 operable to produce output which may be in an order corresponding with the order of a desired output document. When output walker 222 has reached the end of the output tables for a particular output document, output walker 222 may send a command signaling that an output document has been completed.

FIG. 8 illustrates one embodiment of order control unit 212 and output walker 222 in more detail. Order control unit 212 creates, maintains and updates output data structures 810 in memory 270, as discussed above. As order control unit 212 generates and updates output data structures 810, order control unit 212 may generate events to output walker 222 identifying content of an output document which is available to be output, where previous data of that output document may have been output (or may be in the process of being output), and place these events in a queue for output walker 222. Such an event may identify an entry in one of the output data structures 810, such that output walker 222 can begin traversing the output data structures 810 at that entry, and continue traversing the output data structures 810 until entries, tables or other structures 810 which have not been completed or filled are encountered. The next walk event can then be obtained from the walk event queue.

In one embodiment, a set of bits may be associated with entries in each output data structure 810 and utilized by order control unit 212 and output walker 222 to generate events and traverse the output tables 810. This may be better explained with reference to FIG. 9, which depicts one embodiment of one output data structure 810. Output structure 810 a may be a table comprising a set of entries 920, each entry having two bits 912, 914. One bit 912 is a valid bit which may be set when a value of the entry is written. Progress bit 914 may be set by either order control unit 212 or output walker 222. When a valid bit 912 or progress bit 914 of an entry is written by order control unit 212, both bits may be checked and, if both bits 912, 914 are set, a walk event may be generated to output walker 222.

Output walker 222 may then traverse output table 810 a starting with the entry 920 referenced in the walk event. The entry 920 may comprise a value, which may be a reference to another memory location in a previously created data structure 820 (e.g. a data structure created by compiler 220, parser 320 or PEP 330) comprising content from a corresponding original structured document. Output walker 222 may output a command and an associated reference corresponding to this entry 920 to value extractor 232, at which point output walker 222 may set the progress bit 914 of the next entry 920 in the output structure 810 a, and, if both bits associated with that next entry 920 are set, that entry 920 may also be processed by output walker 222. If both bits 912, 914 of the next entry 920 are not set, output walker 222 may obtain the next walk event from the walk event queue.

After reading the above description, however, it may be noticed that an entry of an output structure 810 may reference another output structure 810 (and entries in this linked output structure 810 may, in turn, reference another linked output structure 810, etc.). Thus, if the referenced entry is a link to another output structure 810, output walker 222 may process each of the entries of the linked output structure 810 (and any output structures 810 referenced by entries in this linked output structure 810, etc.) until each output structure 810 at a deeper level than the original entry has been processed. During this traversal process, output walker 222 may output one or more commands and associated references corresponding to the traversed entries in output structures 810.
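The depth-first traversal of linked output structures described above can be sketched as follows. The sketch makes two simplifying assumptions that are not in the original: tables are plain Python lists with a nested list standing for a link to another table, and the recursion's call stack plays the role of the reference from a linked table back to the next entry of its parent. The function name `flatten` and the sample content are hypothetical.

```python
def flatten(table):
    """Yield entry values in document order, descending into linked tables.

    A nested list models an entry that links to another output table; returning
    from the recursive call models following the link back to the next entry.
    """
    for entry in table:
        if isinstance(entry, list):
            yield from flatten(entry)
        else:
            yield entry


# Sample tables (illustrative content only).
second_level = ["<i>", "x", "</i>"]
first_level = ["<p>", second_level, "</p>"]
root_table = ["<doc>", first_level, "</doc>"]

document = "".join(flatten(root_table))
```

All entries of a linked table, and of any tables linked beneath it, are emitted before the walk resumes with the entry that follows the link.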

A particular example may be helpful in illustrating the use of progress bit 914 and valid bit 912. In one embodiment, when output structure 810 a is created by order control unit 212, progress bit 914 and valid bit 912 corresponding with each entry may be cleared. Initially, order control unit 212 may receive a command from transformation engine 340 to set entry 920 a to a value. Order control unit 212 may set the value of entry 920 a, valid bit 912 a and progress bit 914 a, and since both valid bit 912 a and progress bit 914 a are set, a walk event with the address of entry 920 a may be sent to output walker 222. Output walker 222 may receive this walk event and process entry 920 a, after which output walker 222 may then set progress bit 914 b associated with entry 920 b. However, as valid bit 912 b is not yet set, output walker 222 may stop traversing output table 810 a and obtain another walk event from the walk event queue.

Suppose now that order control unit 212 sets the value of entries 920 b and 920 c (setting valid bits 912 b and 912 c associated with entries 920 b and 920 c). Order control unit 212 may then check both valid bit 912 b and progress bit 914 b, and since both valid bit 912 b and progress bit 914 b are set, a walk event with the address of entry 920 b may be sent to output walker 222. Output walker 222 may receive this walk event and process entry 920 b, after which output walker 222 may then set progress bit 914 c associated with entry 920 c. Output walker 222 may then check progress bit 914 c and valid bit 912 c, determine that entry 920 c should be processed, and process entry 920 c. While the above description illustrates one way to order the processing of output structures 810 by output walker 222, it will be apparent that other methods of ordering entries in output structures 810 may be utilized, such as registers which hold addresses of entries 920 of output structures 810, etc.
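The interplay of the valid and progress bits in the example above can be simulated in software. The following is a single-threaded sketch under stated assumptions: the names `OutputTable`, `write_entry` and `walk` are hypothetical, entries hold plain strings rather than references, and the concurrency between the order control unit and the output walker in hardware is not modeled.

```python
from collections import deque


class OutputTable:
    """One output table: a value, a valid bit and a progress bit per entry."""

    def __init__(self, size):
        self.values = [None] * size
        self.valid = [False] * size      # set when the entry's value is written
        self.progress = [False] * size   # set when all earlier entries are done


def write_entry(table, index, value, walk_events):
    """Order control side: store a value and queue a walk event if both bits are set."""
    table.values[index] = value
    table.valid[index] = True
    if index == 0:
        table.progress[0] = True         # the first entry has no predecessor
    if table.progress[index]:
        walk_events.append(index)


def walk(table, walk_events, output):
    """Output walker side: emit entries in order until an unwritten one is reached."""
    while walk_events:
        index = walk_events.popleft()
        while True:
            output.append(table.values[index])
            index += 1
            if index == len(table.values):
                break                    # end of table
            table.progress[index] = True
            if not table.valid[index]:
                break                    # wait for the next walk event


table = OutputTable(3)
events, output = deque(), []

write_entry(table, 0, "a", events)
walk(table, events, output)              # emits "a", then stalls at entry 1

write_entry(table, 2, "c", events)       # written early: no walk event yet
walk(table, events, output)
after_early_write = list(output)

write_entry(table, 1, "b", events)       # both bits set: walk event for entry 1
walk(table, events, output)              # emits "b" and continues to "c"
```

Here entry 2 is written before entry 1, unlike the ordering in the example above, to show that data stored early is held back until every earlier entry has been output.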

When processing each of the entries 920 of an output structure 810, output walker 222 may generate a command for value extractor 232 based on the entry 920 being processed. In one embodiment, a command for value extractor 232 comprises an opcode derived from a type of entry 920 in the output table 810 being processed and associated information for the entry 920. This associated information may comprise one or more memory references or serial numbers which correspond to values associated with that entry. In certain cases, a memory reference or serial number corresponding to a value may be obtained by accessing one or more data structures 820 previously created in memory 270 during previous processing of the original structured document, or the transformation of the structured document (e.g. by parser 320, PEP 330, TE 340, etc.). These data structures 820 may include a data structure 820 representing the structure of the original document and comprising values, or serial numbers corresponding to values, of the original structured document or data structures 820 comprising a list of nodes of the original document, etc., as discussed previously. To obtain data from these data structures 820, output walker 222 may traverse one or more of these data structures 820 in conjunction with the processing of an entry 920 of an output structure 810. This traversal of data structures 820 may be done by a logical unit, which may be separate hardware circuitry, of output walker 222.

In one embodiment, while processing entries 920 in output structures 810, output walker 222 may encounter an entry which corresponds to a message associated with a set of transformation instructions associated with the output document being assembled. This message may correspond to a transformation instruction indicating that a message is to be generated to the output during the transformation or processing of a structured document (e.g. xsl:message command). If output walker 222 encounters such an entry 920 in an output table, and this entry has been completed, output walker 222 may substantially immediately transmit a command to value extractor 232 to output the host bound message to HIU 310 before continuing traversal of data structures 810.

Value extractor 232 may receive commands from output walker 222, and based on these commands may retrieve content or values from memory 270 and generate text output corresponding with an output document. More particularly, value extractor 232 may utilize the reference or serial number associated with a command received from output walker 222, and may access data structures in memory 270 utilizing the reference or serial number to obtain a character string (e.g. the value of content for the output document). Markup may be added to the character string (i.e. appended to the character string, prepended to the character string or inserted within the character string) depending on the desired type of the output document. Indentation of the character stream may then be accomplished, attributes of the output document merged, and the resulting character string streamed or output to output formatter 242.

A block diagram for one embodiment of value extractor 232 is depicted in FIG. 10. As detailed above, value extractor 232 may receive commands from output walker 222 and produce a stream of characters for an output document to output formatter 242. In one embodiment, value extractor 232 comprises a five stage logic pipeline, which may be implemented substantially completely in hardware. This logic pipeline may comprise decode stage 1010, compare stage 1020, update stage 1030, fetch stage 1040 and markup stage 1050 and, in one embodiment, control logic 1060 operable to control the operation of stages 1010, 1020, 1030, 1040, 1050. Each stage 1010, 1020, 1030, 1040, 1050 may perform specific functions to ensure that well-formedness constraints corresponding with a type of the output document (e.g. XML, HTML, text, etc.) are placed on the character stream generated for the output document. In case of violations, the transformation may either be terminated or continue, depending upon the severity of the error. In one embodiment, an error may be reported to HIU 310 in such a situation.

More particularly, decode stage 1010 retrieves commands from a queue where commands from output walker 222 may have been placed. These commands may comprise an opcode and, in some cases, associated data such as the type of the output document, a reference to a location in memory 270, or a serial number corresponding with a value in a data structure 820 in memory 270. The decode stage 1010 may initiate a fetch of a value associated with the memory reference or serial number and generate a command for fetch stage 1040. Additionally, decode stage 1010 may generate a command for compare stage 1020 to obtain data to identify if a particular node is unique, or a command for update stage 1030 to update one or more data structures in memory 270. Decode stage 1010 may stall until results are received in response to commands issued to compare stage 1020.

Compare stage 1020 may be operable to identify unique or duplicate nodes in the output document or merge attributes in the output document based on commands received from decode stage 1010. These nodes may include elements, attributes, namespaces, etc. To determine if a node is a duplicate or unique, references or serial numbers associated with nodes may be compared with data structures 820 or 1032 corresponding to the node type being analyzed. The results of the comparison performed by compare stage 1020 may be passed to the update stage 1030 or the decode stage 1010. More particularly, compare stage 1020 may set match flags (indicating if a node has been matched) for decode stage 1010 and send commands to update stage 1030.

Update stage 1030 maintains stacks and tables or other data structures 1032 used during output generation. It may interface directly with these memory structures 1032 in memory 270 (e.g. through memory interface 360) and may provide a busy indication when a data structure 1032 which is being accessed is being fetched or updated. More particularly, update stage 1030 may maintain stacks and tables 1032 which may be used for providing output scope context, generating namespace declarations, and eliminating attribute duplication. These stacks may include an Element Declaration Stack (EDS), an Output Namespace Scope Stack (ONSS), and an Exclude Results Prefix Stack (ERPS). The tables may include an Attribute Merge Content Table (AMCT). The EDS may be used for determining the scope range of a namespace. The ONSS is used to determine whether a new namespace has been declared in the current scope or within an ancestor element. The AMCT is used to remove duplicate attribute names and attribute set names.
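The role described for the ONSS, deciding whether a namespace is already declared in the current scope or in an ancestor element, can be illustrated with a simple stack of per-element scopes. This is a sketch of the general technique only; the class `NamespaceScopeStack` and its methods are hypothetical and are not a description of how the ONSS is organized in memory 270.

```python
class NamespaceScopeStack:
    """Stack of {prefix: uri} scopes, one per open element (hypothetical model)."""

    def __init__(self):
        self.scopes = []

    def push_element(self):
        self.scopes.append({})

    def pop_element(self):
        self.scopes.pop()

    def needs_declaration(self, prefix, uri):
        """True unless the nearest enclosing binding of prefix already maps to uri."""
        for scope in reversed(self.scopes):
            if prefix in scope:
                return scope[prefix] != uri
        return True

    def declare(self, prefix, uri):
        self.scopes[-1][prefix] = uri


stack = NamespaceScopeStack()
stack.push_element()                                         # outer element opens
outer_needs = stack.needs_declaration("x", "urn:a")          # not declared anywhere yet
stack.declare("x", "urn:a")
stack.push_element()                                         # child element opens
child_needs = stack.needs_declaration("x", "urn:a")          # inherited from ancestor
child_rebinding = stack.needs_declaration("x", "urn:b")      # same prefix, new URI
stack.pop_element()
stack.pop_element()
```

A declaration is emitted only when the lookup reports that the binding is new, which avoids repeating a namespace declaration that an ancestor element already carries.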

Fetch stage 1040 receives commands from decode stage 1010. These commands may comprise an opcode, the type of the output document and a pointer or reference to a location in memory. Based on the opcode of a command, fetch stage 1040 may retrieve the character strings that make up node names, targets, values, text etc. using pointer values supplied in the command, and insert markup characters for the appropriate type of the output document. Pointers associated with a received command may reference data structures 810 or 820 comprising a set of tables with the character strings comprising content of the output document, and fetch stage 1040 may obtain these character strings by accessing these data structures 810, 820 using the reference associated with the command. Fetch stage 1040 may insert markup characters into, or append or prepend markup characters to, the character string, before sending the character string to markup stage 1050.

Markup stage 1050 may receive the character string and check to make sure the character string is well-formed according to a standard for the type of the output document, and, if possible, fix any errors detected. If it is not possible to fix the error according to a standard for the type of the output document, markup stage 1050 may generate an error, for example to HIU 310. Markup stage 1050 may also perform output escaping and character escaping on this character stream, based on the type of the output document associated with this character stream.

Thus, a character stream comprising at least a portion of an output document, corresponding to a command received at value extractor 232 and corresponding to a type of the output document (e.g. HTML, XML, text, etc.) may be produced by stages 1010, 1020, 1030, 1040, 1050. Before outputting this character stream, however, value extractor 232 (e.g. markup stage 1050) may perform output escaping and character escaping on this character stream, based on the type of the output document associated with this character stream. As the escaping is performed, then, this stream of characters may be delivered to output formatter 242.
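Escaping that depends on the type of the output document can be sketched as follows. The sketch is deliberately simplified and rests on assumptions: the function `escape_for_output` and the lowercase type names are hypothetical, HTML is treated the same as XML even though real HTML output has exceptions, and the only library call used is `xml.sax.saxutils.escape` from the Python standard library, which replaces `&`, `<` and `>`.

```python
from xml.sax.saxutils import escape as xml_escape


def escape_for_output(text, output_type):
    """Escape character data according to the output document type (simplified)."""
    if output_type in ("xml", "html"):
        # Markup-significant characters are replaced with entity references.
        return xml_escape(text)
    if output_type == "text":
        # Plain text output is passed through without escaping.
        return text
    raise ValueError("unsupported output type: " + output_type)


as_xml = escape_for_output("a < b & c", "xml")
as_text = escape_for_output("a < b & c", "text")
```

The same content therefore yields different character streams depending on the type of the output document it belongs to.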

FIG. 11 depicts one embodiment of value extractor 232 and output formatter 242. As just discussed, value extractor 232 produces a stream of characters associated with a particular output document to output formatter 242. Output formatter 242 may receive this stream of characters and convert each of these received characters from an internal format or representation (e.g. UTF-16) to an encoding desired for the characters of an output document (e.g. UTF-8, WIN 1252, Big Endian, Little Endian, etc.).
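The conversion from an internal representation to a requested output encoding can be shown with Python's built-in codecs. This is a sketch under assumptions: the internal form is taken to be little-endian UTF-16, the function name `convert` and the sample string are hypothetical, and `cp1252` is the Python codec name used here for Windows-1252.

```python
def convert(internal_utf16le: bytes, target_encoding: str) -> bytes:
    """Decode the assumed internal UTF-16 (little-endian) form and re-encode it."""
    return internal_utf16le.decode("utf-16-le").encode(target_encoding)


internal = "café".encode("utf-16-le")        # sample content in the internal form

utf8 = convert(internal, "utf-8")            # 'é' becomes the two bytes C3 A9
cp1252 = convert(internal, "cp1252")         # 'é' becomes the single byte E9
utf16be = convert(internal, "utf-16-be")     # same code units, bytes swapped
```

Only the byte representation changes; the sequence of characters in the output document is the same in every case.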

As the characters in the character stream are converted, then, output generator 350 may return each of the characters to the HIU 310 by associating each of the characters with a job ID and placing the character in a FIFO queue for reception by HIU 310. Thus, output streams comprising portions of different output documents may be interleaved in the output of output generator 350. In other words, a portion of one output document may be output by output generator 350, followed by a portion of another output document, followed by another portion of the first output document. By tagging each of the output streams with a job ID, HIU 310 may receive these various portions of output documents and assemble the portions into a complete output document, or may output each portion of an output document to a host substantially as it arrives.
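Reassembling interleaved output by job ID can be sketched as follows. The function `reassemble`, the integer job IDs and the sample stream are assumptions for illustration; the sketch also groups output into chunks rather than single characters for readability.

```python
from collections import defaultdict


def reassemble(tagged_stream):
    """Group (job_id, chunk) pairs by job ID, preserving arrival order per job."""
    documents = defaultdict(list)
    for job_id, chunk in tagged_stream:
        documents[job_id].append(chunk)
    return {job_id: "".join(chunks) for job_id, chunks in documents.items()}


# Portions of two output documents interleaved in one FIFO stream (sample data).
stream = [(1, "<a>"), (2, "<html>"), (1, "</a>"), (2, "</html>")]
docs = reassemble(stream)
```

Because each job's portions arrive in order relative to one another, tagging with the job ID is enough to separate the documents again.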

In the foregoing specification, the invention has been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of invention. For example, it will be apparent to those of skill in the art that although the present invention has been described with respect to a protocol controller in a routing device, the inventions and methodologies described herein may be applied in any context which requires the determination of the protocol of a bit stream.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

1. An apparatus, comprising a hardware circuit operable to determine that a first output document data stored in an output data structure associated with the first output document is in an order corresponding to the first output document, obtain content for the first output document based on the first output document data, format the content according to a type of the first output document and generate output comprising the content, wherein the output comprises a first portion of the output document.
2. The apparatus of claim 1, wherein the hardware circuit comprises an order control circuit operable to create the output data structure associated with the first output document, receive the first output document data to be stored in the output data structure and store the first output document data to the output data structure, wherein the first output document data may be received in an order differing from an order corresponding to the first output document.
3. The apparatus of claim 2, comprising an output walker circuit operable to traverse the first output document and obtain a reference related to the first output document data based on a walk event command from the order control unit.
4. The apparatus of claim 3, comprising a value extractor circuit operable to obtain content for the first output document based on a command from the output walker circuit, wherein the first command comprises the reference and format the content according to a type of the first output document to generate a set of characters comprising the first portion of the output document.
5. The apparatus of claim 4, comprising an output formatter circuit operable to encode the set of characters according to an encoding scheme.
6. The apparatus of claim 5, wherein the hardware circuit is operable to generate output comprising a second portion of the first output document and a first portion of a second output document, wherein the first document is distinct from the second document and the output comprising the first portion of the second output document is generated after the output comprising the first portion of the first document and before the second portion of the first document.
7. The apparatus of claim 6, wherein the type of the first output document and a type of the second output document is extensible Markup Language (XML), Hyper Text Markup Language (HTML) or text.
8. The apparatus of claim 7, wherein the type of the first output document is different from the type of the second output document.
9. A system, comprising: a transformation engine circuit operable to generate first output document data associated with a first output document, where the first output document data is generated in an order that does not correspond to an order for the first output document and store this first output document data to a first output data structure associated with the first output document; and an output generator circuit operable to determine that the first output document data stored in the first output data structure associated with the first output document is in an order corresponding to the first output document, obtain content for the first output document based on the first output document data, format the content according to a type of the first output document and generate output comprising the content, wherein the output comprises a first portion of the first output document.
10. The system of claim 9, wherein the transformation engine circuit is operable to send a first command and a second command to the output generator circuit, wherein the first command is executable to create the first output data structure and the second command is executable to store the first output document data to the first output data structure and the output generator circuit is operable to execute the first command and second command to create the first output data structure associated with the first output document and store the first output document data to the first output data structure.
11. The system of claim 9, wherein the transformation engine circuit is operable to generate second output document data associated with a second output document which does not correspond to an order for the second output document and store this second output document data to a second output data structure associated with a second output document; and the output generator circuit is operable to generate output comprising a first portion of the second output document.
12. The system of claim 11, wherein the transformation engine circuit is operable to generate third output document data associated with the first output document and store this third output document data to the first output data structure associated with the first output document and the output generator circuit is operable to generate output comprising a second portion of the first output document.
13. The system of claim 12, wherein the first output document is distinct from the second output document.
14. The system of claim 13, wherein the output comprising the first portion of the second output document is generated after the output comprising the first portion of the first output document and before the output comprising the second portion of the first output document.
15. The system of claim 14, wherein the type of the first output document and a type of the second output document is XML, HTML or text.
16. The system of claim 15, wherein the type of the first output document is different from the type of the second output document.
17. A method, comprising: in an output generator circuit, determining that first output document data associated with a first output document in a first output data structure associated with a first output document is in an order corresponding to the first output document; generating content based on the first output document data; formatting the content according to a type of the first output document; and generating output comprising the content, wherein the output comprises a first portion of the first output document.
18. The method of claim 17, comprising receiving the first output document data, wherein the first output document data is received in an order that does not correspond to the order for the first output document; and storing this first output document data to the first output document data structure.
19. The method of claim 17, comprising: generating content based on second output document data; formatting the content according to a type of the second output document; and generating output comprising the content, wherein the output comprises a first portion of a second output document.
20. The method of claim 19, wherein the second output document is distinct from the first output document.
21. The method of claim 20, comprising generating output comprising a third portion of a second document, wherein the output comprising the first portion of the second output document is generated after the output comprising the first portion of the first output document and before the output comprising the second portion of the first output document.
22. The method of claim 21, wherein the type of the first output document and a type of the second output document is XML, HTML or text.
23. The method of claim 22, wherein the type of the first output document is different from the type of the second output document.